@asaadbalum asaadbalum commented Nov 24, 2025

This PR implements the Istio profile for the E2E testing framework as requested in issue #656.

Istio Service Mesh Integration E2E Test Profile

🎯 Overview

Successfully implemented and validated a comprehensive E2E testing profile for Istio service mesh integration with Semantic Router. All 17 tests pass with 100% success rate in local testing.

✅ Test Results

Local Testing:

  • 17/17 tests passing (100% success rate)
  • ⏱️ Duration: 22 minutes
  • 🔧 Profile: Istio service mesh with sidecar injection
  • 📊 Coverage: 4 Istio-specific + 13 common tests

✨ Implementation

Istio-Specific Tests (4)

  1. Sidecar Health Check

    • Verifies Envoy sidecar injection in Semantic Router pods
    • Validates sidecar health and readiness (2/2 containers)
    • Confirms istio-injection=enabled namespace label
  2. Traffic Routing

    • Tests request routing through Istio ingress gateway
    • Validates Istio/Envoy response headers (Server: istio-envoy)
    • HTTP 200 OK with proper routing
  3. mTLS Verification

    • Confirms DestinationRule with ISTIO_MUTUAL mode
    • Validates Istio proxy certificates presence
    • Verifies mutual TLS between services
  4. Tracing & Observability

    • Validates Envoy metrics collection
    • Checks telemetry configuration
    • Confirms distributed tracing capabilities
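
As a rough illustration of the sidecar health check described above, the sketch below verifies that a pod has an `istio-proxy` container and that all of its containers are ready (the "2/2" condition). The `Pod` and `ContainerStatus` types are simplified, hypothetical stand-ins for the Kubernetes API objects, not the types used in `e2e/testcases`. The sketch also assumes the sidecar shows up as a regular container named `istio-proxy`.

```go
package main

import "fmt"

// ContainerStatus is a hypothetical, trimmed-down stand-in for the
// Kubernetes container status; only the fields this sketch needs exist.
type ContainerStatus struct {
	Name  string
	Ready bool
}

// Pod is a hypothetical, simplified pod representation.
type Pod struct {
	Name       string
	Containers []ContainerStatus
}

// sidecarHealthy reports whether the pod has a container named
// "istio-proxy" and whether every container in the pod is ready.
// The second return value is a human-readable readiness summary.
func sidecarHealthy(p Pod) (bool, string) {
	hasSidecar := false
	ready := 0
	for _, c := range p.Containers {
		if c.Name == "istio-proxy" {
			hasSidecar = true
		}
		if c.Ready {
			ready++
		}
	}
	summary := fmt.Sprintf("%d/%d containers ready", ready, len(p.Containers))
	if !hasSidecar {
		return false, "no istio-proxy container; " + summary
	}
	return ready == len(p.Containers), summary
}

func main() {
	// Illustrative pod; the name is made up for this example.
	pod := Pod{Name: "semantic-router-0", Containers: []ContainerStatus{
		{Name: "semantic-router", Ready: true},
		{Name: "istio-proxy", Ready: true},
	}}
	ok, summary := sidecarHealthy(pod)
	fmt.Println(ok, summary)
}
```

A real test would populate these fields from the pod's container statuses returned by the Kubernetes API rather than from literals.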

Common Tests (13)

All shared tests passing:

  • Chat completions (standard + 200 sequential stress requests @ 100% success)
  • Domain classification (50.36% accuracy on MMLU dataset)
  • Semantic caching (88% hit rate)
  • PII detection (96% detection rate)
  • Jailbreak detection (32% detection rate)
  • Decision priority selection (75% accuracy)
  • Plugin chain execution (75% accuracy)
  • Rule condition logic (50% accuracy with AND/OR operators)
  • Decision fallback behavior (40% accuracy)
  • Keyword routing (45.45% accuracy)
  • Plugin config variations (50% accuracy)
  • Progressive stress (10/20/50 concurrent requests)
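
The progressive stress test can be pictured roughly as follows: fire N requests in parallel, count the HTTP 200 responses, and repeat for N = 10, 20, 50. This is a minimal sketch, not the framework's implementation. It uses plain GET requests against a local stub server, whereas the real test would send chat completion requests through the gateway. `runConcurrent` and `newStubServer` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
)

// runConcurrent fires n GET requests at url in parallel and returns the
// fraction of them that came back with HTTP 200.
func runConcurrent(client *http.Client, url string, n int) float64 {
	if n <= 0 {
		return 0
	}
	var okCount int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Get(url)
			if err != nil {
				return // transport errors count as failures
			}
			defer resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				atomic.AddInt64(&okCount, 1)
			}
		}()
	}
	wg.Wait()
	return float64(atomic.LoadInt64(&okCount)) / float64(n)
}

// newStubServer starts a local server that always answers with status.
// It stands in for the gateway endpoint in this sketch.
func newStubServer(status int) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
	}))
}

func main() {
	srv := newStubServer(http.StatusOK)
	defer srv.Close()

	// Progressive load levels taken from the test list above.
	for _, n := range []int{10, 20, 50} {
		rate := runConcurrent(srv.Client(), srv.URL, n)
		fmt.Printf("%d concurrent: %.0f%% success\n", n, rate*100)
	}
}
```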

🏗️ Architecture

Service Mesh Components:

┌─────────────────────┐
│ Istio Control Plane │
│      (istiod)       │
└──────────┬──────────┘
           │
┌──────────┴──────────┐
│    Istio Ingress    │
│       Gateway       │
└──────────┬──────────┘
           │
┌──────────┴──────────┐
│   Semantic Router   │
│   + Envoy Sidecar   │
│   (vllm-semantic-   │
│   router-system)    │
└──────────┬──────────┘
           │
┌──────────┴──────────┐
│    Envoy Gateway    │
│      (ExtProc)      │
└──────────┬──────────┘
           │
┌──────────┴──────────┐
│  Envoy AI Gateway   │
│  (CRDs for service  │
│     discovery)      │
└──────────┬──────────┘
           │
┌──────────┴──────────┐
│    vLLM Backend     │
│    (via Gateway     │
│   API resources)    │
└─────────────────────┘

Key Design Decisions:

1. Hybrid Architecture (Istio + Envoy Gateway)

  • Istio: Provides service mesh capabilities (mTLS, observability, traffic management)
  • Envoy Gateway: Handles ExtProc communication with Semantic Router
  • Rationale: Semantic Router requires ExtProc calls from Envoy proxy; Istio Ingress Gateway alone doesn't provide this

2. Dynamic Service Discovery

  • Uses AIServiceBackend CRDs (via Envoy AI Gateway)
  • No hardcoded IPs or manual ClusterIP injection
  • Scalable, production-ready architecture

3. Namespace Configuration

  • Deploys to vllm-semantic-router-system (consistent with other profiles)
  • Istio-specific tests hardcoded to use correct namespace
  • No framework modifications required

🎁 Additional Enhancements

Keyword Routing Support ⚠️ Shared Enhancement

Added keyword-based routing decisions to e2e/profiles/ai-gateway/values.yaml:

New Decisions:

  • urgent_request (priority 30)

    • Detects: urgent, immediate, asap, emergency (OR logic)
    • Use case: Fast-track urgent requests
  • sensitive_data (priority 40)

    • Detects: SSN AND credit card (AND logic)
    • Use case: Security-sensitive queries

Impact: This enhancement benefits ALL profiles that include the keyword-routing test. It's a shared configuration improvement, not Istio-specific.

Rationale: The keyword-routing common test was failing because keyword routing rules were missing from the configuration. These additions enable the test to pass across all profiles that use ai-gateway/values.yaml.
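
To make the OR/AND semantics of the two new decisions concrete, here is a minimal sketch of keyword rule evaluation. It is not the router's actual matcher: `KeywordRule`, `selectDecision`, and `defaultRules` are hypothetical names, matching is a plain case-insensitive substring check, and it assumes the decision with the higher priority number wins when several rules match.

```go
package main

import (
	"fmt"
	"strings"
)

// KeywordRule is a hypothetical model of a keyword-based decision.
type KeywordRule struct {
	Name     string
	Priority int
	Operator string // "OR": any keyword matches; "AND": all keywords must match
	Keywords []string
}

// matches applies the rule's operator using case-insensitive substring checks.
func (r KeywordRule) matches(query string) bool {
	q := strings.ToLower(query)
	hits := 0
	for _, k := range r.Keywords {
		if strings.Contains(q, strings.ToLower(k)) {
			hits++
		}
	}
	if r.Operator == "AND" {
		return len(r.Keywords) > 0 && hits == len(r.Keywords)
	}
	return hits > 0
}

// selectDecision returns the name of the matching rule with the highest
// priority, or "" when nothing matches. "Higher number wins" is an
// assumption of this sketch.
func selectDecision(rules []KeywordRule, query string) string {
	best, bestPriority := "", -1
	for _, r := range rules {
		if r.matches(query) && r.Priority > bestPriority {
			best, bestPriority = r.Name, r.Priority
		}
	}
	return best
}

// defaultRules mirrors the two decisions listed in this PR description.
func defaultRules() []KeywordRule {
	return []KeywordRule{
		{Name: "urgent_request", Priority: 30, Operator: "OR",
			Keywords: []string{"urgent", "immediate", "asap", "emergency"}},
		{Name: "sensitive_data", Priority: 40, Operator: "AND",
			Keywords: []string{"SSN", "credit card"}},
	}
}

func main() {
	rules := defaultRules()
	for _, q := range []string{
		"I need this ASAP",
		"Here is my SSN and credit card number",
		"Here is my SSN only",
	} {
		fmt.Printf("%q -> %q\n", q, selectDecision(rules, q))
	}
}
```

Under these assumptions, a query mentioning only "SSN" matches neither rule, because the AND rule needs both keywords.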

📊 Detailed Test Metrics

| Test Category  | Metric              | Value                        |
|----------------|---------------------|------------------------------|
| Overall        | Success Rate        | 100%                         |
|                | Total Tests         | 17                           |
|                | Duration            | 22m 2s                       |
| Istio Tests    | Sidecar Health      | ✅ 1/1 pods healthy          |
|                | Traffic Routing     | ✅ 200 OK via Istio gateway  |
|                | mTLS Mode           | ✅ ISTIO_MUTUAL              |
|                | Observability       | ✅ Metrics + Telemetry       |
| Stress Tests   | Sequential (200)    | 100% success, 447ms avg      |
|                | Concurrent (50)     | 100% success                 |
| Detection      | PII Detection       | 96%                          |
|                | Jailbreak Detection | 32%                          |
|                | Semantic Cache Hit  | 88%                          |
| Classification | Domain Accuracy     | 50.36% (MMLU)                |
|                | Keyword Accuracy    | 45.45%                       |

🔧 Technical Implementation

Deployment Sequence:

  1. Install Istio control plane (base + istiod + ingress gateway)
  2. Configure namespace with istio-injection=enabled label
  3. Deploy Semantic Router (auto-injects Envoy sidecar)
  4. Deploy Envoy Gateway for Gateway API support
  5. Deploy Envoy AI Gateway for AIServiceBackend CRDs
  6. Deploy vLLM backend via Gateway API resources
  7. Create Istio traffic management resources (Gateway, VirtualService, DestinationRule)
  8. Verify all components and run tests
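
The deployment sequence is order-dependent, so a profile's setup can be thought of as a list of named steps that run in order and stop at the first failure. The sketch below illustrates that shape only. `step`, `runSteps`, `okStep`, and `failStep` are hypothetical names, and the step bodies are no-ops rather than real Helm or kubectl calls.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical named setup action.
type step struct {
	name string
	run  func() error
}

// runSteps executes steps in order and stops at the first failure. It
// returns the names of the steps that completed, plus a wrapped error
// identifying the failing step.
func runSteps(steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.run(); err != nil {
			return done, fmt.Errorf("step %q failed: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

// okStep builds a placeholder step that always succeeds.
func okStep(name string) step {
	return step{name: name, run: func() error { return nil }}
}

// failStep builds a placeholder step that always fails.
func failStep(name string) step {
	return step{name: name, run: func() error { return errors.New("simulated failure") }}
}

func main() {
	// Step names follow the deployment sequence listed above.
	steps := []step{
		okStep("install Istio control plane"),
		okStep("label namespace istio-injection=enabled"),
		okStep("deploy Semantic Router"),
		okStep("deploy Envoy Gateway"),
		okStep("deploy Envoy AI Gateway"),
		okStep("deploy vLLM backend"),
		okStep("create Istio traffic management resources"),
		okStep("verify components"),
	}
	done, err := runSteps(steps)
	fmt.Println(len(done), "steps completed, error:", err)
}
```

Returning the list of completed steps makes it straightforward to tear down only what was actually installed if a later step fails.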

Configuration Details:

  • Values File: e2e/profiles/ai-gateway/values.yaml
  • Namespace: vllm-semantic-router-system
  • Sidecar Injection: Automatic via namespace labeling
  • Helm Timeout: 30 minutes (for model downloads)
  • CI Timeout: 75 minutes (shared across all profiles)
  • Image Pull Policy: Never (uses local images in CI)

Files Modified:

  • e2e/profiles/istio/profile.go (322 insertions) - Main profile implementation
  • e2e/profiles/ai-gateway/values.yaml (54 insertions) - Keyword routing support
  • e2e/testcases/istio_*.go (4 files, 24 insertions) - Namespace fixes

Total: 6 files, 336 insertions(+), 72 deletions(-)

✅ Pre-commit & CI Validation

Local Validation:

  • ✅ All pre-commit hooks passed
  • ✅ YAML linting (fixed trailing spaces)
  • ✅ Go formatting
  • ✅ Trailing whitespace removed
  • ✅ DCO signoff present
  • ✅ No linter errors

CI Integration:

  • Added to .github/workflows/integration-test-k8s.yml
  • Runs alongside: ai-gateway, aibrix, routing-strategies, llm-d
  • Expected duration: ~22 minutes (well within 75-minute timeout)

📚 References

  • Local Testing: 17/17 tests passing (100%)
  • Test Report: Available in test run artifacts
  • Configuration: Follows existing profile patterns
  • Compatibility: Tested with Istio 1.28.0 (default)

FIX #656

netlify bot commented Nov 24, 2025

Deploy Preview for vllm-semantic-router ready!

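| Name                 | Link                                                                                   |
|----------------------|----------------------------------------------------------------------------------------|
| 🔨 Latest commit     | 79ccdbb                                                                                |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/692d94394b98470008667cf4 |
| 😎 Deploy Preview    | https://deploy-preview-728--vllm-semantic-router.netlify.app                           |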

github-actions bot commented Nov 24, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/profiles/istio/profile.go
  • e2e/testcases/istio_mtls_verification.go
  • e2e/testcases/istio_sidecar_health_check.go
  • e2e/testcases/istio_tracing_observability.go
  • e2e/testcases/istio_traffic_routing.go
  • e2e/README.md
  • e2e/cmd/e2e/main.go
  • e2e/pkg/helpers/kubernetes.go
  • e2e/profiles/ai-gateway/values.yaml
  • e2e/testcases/common.go

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/integration-test-k8s.yml

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/e2e.mk

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Collaborator

rootfs commented Nov 24, 2025

@srampal PTAL, thanks

Author

asaadbalum commented Nov 24, 2025

Hello Reviewers.
Bear in mind that the current implementation installs the Istio binary in the CI workflow step; I'm considering a Helm-based installation during the Istio setup() phase.
Edit: Done

@asaadbalum asaadbalum force-pushed the istio_profile branch 2 times, most recently from c71d529 to bccdccf Compare November 24, 2025 15:51
Member

@Xunzhuo Xunzhuo left a comment

Initially I think this is good to go separately with the Istio-specific cases, but I still think we need to reuse the test cases to make sure the Istio gateway works well with vSR functionality.

@asaadbalum asaadbalum force-pushed the istio_profile branch 6 times, most recently from 8002d08 to 0ebd1bb Compare December 1, 2025 11:50
Implement comprehensive E2E testing profile for Istio service mesh integration with Semantic Router:

- Add Istio profile with 4 Istio-specific tests and 13 common tests (17 total)
- Deploy Semantic Router with Istio sidecar injection and service mesh features
- Integrate Envoy Gateway for ExtProc communication alongside Istio mesh capabilities
- Deploy vLLM backend via Gateway API resources with AIServiceBackend CRDs
- Add keyword routing support (urgent_request and sensitive_data decisions)
- Fix Istio test namespace resolution to use vllm-semantic-router-system
- All 17 tests passing with 100% success rate in local testing

Test coverage includes:
- Istio sidecar injection and health verification
- Traffic routing through Istio ingress gateway
- mTLS verification between services
- Distributed tracing and observability
- Chat completions, stress tests, and domain classification
- Plugin chain execution, PII/jailbreak detection, semantic caching

Signed-off-by: Asaad Balum <[email protected]>

Development

Successfully merging this pull request may close these issues.

[E2E] Add Istio profile for E2E testing framework

3 participants